-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add e2e test for queue::fill with a range of pattern sizes #15991
base: sycl
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Hi @rafbiels - I'd like to merge the UR side of this soon. Could you rebase this PR and ensure the CI is run? If it passes I'll merge the UR PR and update the tag here. |
3f1cb46
to
56e7eb4
Compare
Hi @callumfare, I rebased both the UR side (onto the current tag used in intel/llvm) and here (onto HEAD of sycl branch) to keep the branches consistent. Feel free to launch the CI as I don't have the permission to do so. |
@rafbiels Thanks, I ran the CI but it looks like the new test failed on Windows |
I'm wrong, it's much more trivial (and annoying) than that. Both platforms use |
The failure should be fixed now, please restart the CI when possible |
@callumfare looks like the UR change was collected into intel/llvm as part of #16040, but I would still like the new e2e test here to be merged. Shall I rebase and update this PR to just add the test now (no UR tag change)? |
…e::fill Update the UR tag to fix queue::fill for the CUDA and HIP backends, which was previously producing incorrect outputs for any pattern size other than 1, 2, or a multiple of 4 bytes. A new optimisation is also added which speeds up the fill greatly if the pattern equals to the first word repeated throughout (e.g. all zeros). Add a new e2e test to validate queue::fill outputs for any pattern size between 1 and 32 bytes. This test fails for CUDA and HIP before the UR change and passes with this PR. Other backends already worked correctly.
1910c7c
to
e86712f
Compare
@intel/llvm-gatekeepers could I have a re-run of the CI here and potentially a re-approval please? The code hasn't changed since the last approval. The only difference is that this PR is no longer updating the unified-runtime tag because the corresponding UR change was already merged elsewhere. It's just the e2e test now. |
I'm looking into the failure on 12th-gen CPU with OpenCL: |
The issue is tracked in oneapi-src/unified-runtime#2440 and I added a corresponding XFAIL statement for the test. @intel/llvm-gatekeepers, may I ask for another (hopefully final) CI run here? |
Update the UR tag to include oneapi-src/unified-runtime#2273 fixingqueue::fill
for the CUDA and HIP backends. It was previously producing incorrect outputs for any pattern size other than 1, 2, or a multiple of 4 bytes. A new optimisation is also added which speeds up the fill greatly if the pattern equals to the first word repeated throughout (e.g. all zeros). See the UR PR for more details.The UR tag update was collected in #16040 so now this PR only adds an e2e test as stated below.
Add a new e2e test to validate
queue::fill
outputs for any pattern size between 1 and 32 bytes. This test fails for CUDA and HIP before the UR change and passes with this PR. Other backends already worked correctly.